Construction of staples in lattice gauge theory on a parallel computer

S. Hioki^*

^*Department of Physics, Hiroshima University, Higashi-Hiroshima 724, Japan

We propose a simple method to construct staples in lattice gauge theory with the Wilson action on a parallel
computer. The method is applicable to a system of any dimension and to a division of the lattice among
processors along any number of directions. Furthermore, it requires only a rather small working area to
realize gauge simulations on a parallel computer.



1. Introduction 

Recently, numerical simulations in lattice gauge theory have become
very important both for investigating non-perturbative physics and for
evaluating interesting quantities from first principles. In fact,
numerical studies have grown with the growth of powerful computers
such as supercomputers and parallel computers. Many commercial
parallel computers are now available, as are many dedicated QCD
parallel computers. In particular, many lattice QCD (Quantum
Chromodynamics) works have been performed on parallel computers [1].

For programmers, however, the variety of parallel computers hinders
the portability of their code. Moreover, parallel code usually needs
more working area than code that runs on a single processor, because
of the overlap with neighboring processors. "Portability of code" and
"reduction of working area" therefore become crucial when simulating
a larger system or moving to a different platform.

In this paper we propose a simple method to overcome these problems
in lattice QCD Monte Carlo simulations. The paper is organized as
follows. In section 2 we introduce the notation and preparations
needed for the later presentation. The actual construction of staples
on a parallel computer is presented in section 3. The summary is
given in section 4.
2. Notations and Preparations

Consider a d-dimensional pure gauge theory defined on a lattice with
the Wilson action. Let $U_{n,\mu}$ be the link variable of the gauge
field connecting site $n$ and site $n+\hat{\mu}$; the staple
$X_{n,\mu}$ associated with the link $U_{n,\mu}$ is
making the Wilson action $S$ as
Let $N_\mu$ ($1 \le \mu \le d$) be the lattice size in the $\mu$
direction. Each site $x$ can then be specified by its coordinates
$x = (x_1, x_2, \ldots, x_d)$ ($0 \le x_\mu \le N_\mu - 1$), and the
total volume is $\prod_{\mu=1}^{d} N_\mu$. Using these coordinates we
can define a global address for each site. One simple way to do this
is to specify the global address $j$ of site $x$ as

$$ j(x) = 1 + x_1 + \sum_{\mu=2}^{d} x_\mu \prod_{k=1}^{\mu-1} N_k . \qquad (5) $$

We then introduce the "normal order" of sites, in which sites are
arranged in increasing order of their global addresses.
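The global address of eq. (5) and the resulting normal order can be sketched in a few lines of Python. This is only an illustration: the function names `global_address` and `sites_in_normal_order` are hypothetical, and coordinates are passed as tuples $(x_1, \ldots, x_d)$.

```python
from itertools import product

def global_address(x, N):
    """Global address j(x) of eq. (5): 1 + x_1 + sum_mu x_mu * prod_{k<mu} N_k."""
    j, stride = 1, 1
    for x_mu, N_mu in zip(x, N):
        j += x_mu * stride   # stride is the product of all earlier lattice sizes
        stride *= N_mu
    return j

def sites_in_normal_order(N):
    """All sites of an N_1 x ... x N_d lattice, sorted by increasing global address."""
    # itertools.product varies its last factor fastest, so build (x_d, ..., x_1)
    # and reverse each tuple to make x_1 the fastest-running coordinate.
    return [tuple(reversed(c)) for c in product(*(range(n) for n in reversed(N)))]
```

With this convention $x_1$ runs fastest, so on a 4 x 2 lattice the first sites in normal order are (0,0), (1,0), (2,0), (3,0), followed by (0,1).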

In order to realize the even-odd (checkerboard) decomposition of
sites, which is necessary for parallelization, we restrict ourselves
to the case where each $N_\mu$ is even. We call a site $x$ an even
(odd) site if $\sum_{\mu=1}^{d} x_\mu$ is even (odd).

Next we put this lattice on a parallel computer which consists of
$\prod_{\mu=1}^{d} M_\mu$ ($1 \le M_\mu \le N_\mu$) processors; that
is, we consider the case where each processor is responsible for
$\prod_{\mu=1}^{d} m_\mu$ ($m_\mu = N_\mu / M_\mu$) sites. If
$M_\mu \ne 1$, the $\mu$ direction is divided into $M_\mu$ pieces and
distributed over multiple processors. If, on the other hand,
$M_\mu = 1$, the $\mu$ direction is not divided but kept on a single
processor. In order to realize the even-odd decomposition of sites
within each processor as well, $\prod_{i=1}^{d} m_i$ is restricted to
be even.

From now on we restrict ourselves to a single processor, denoted by
$\mathcal{P}$; the discussion below is universal and applies to every
processor. For convenience we denote the neighboring processor in the
$\pm\mu$ direction by $\mathcal{P} \pm \hat{\mu}$.

We then classify all sites in $\mathcal{P}$ into two groups: the even
site group $G_e$, which consists of all even sites, and the odd site
group $G_o$, which consists of all odd sites. Since both groups have
$V$ ($= \prod_{i=1}^{d} m_i / 2$) elements, we can number all elements
of $G_e$ ($G_o$) in $\mathcal{P}$ from 1 to $V$ in the normal order
introduced above. This numbering defines the local address of a site
in $\mathcal{P}$.
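As an illustration of the local addressing, the following Python sketch numbers the even and odd sites of one processor separately, each from 1 to $V$ in normal order. The names `local_addresses` and `origin` are hypothetical; `origin` stands for the global coordinates of the processor's first site and is assumed to be known, so that the parity is that of the global coordinates.

```python
from itertools import product

def sites_in_normal_order(m):
    """Local coordinates of a block of size m_1 x ... x m_d, x_1 running fastest."""
    return [tuple(reversed(c)) for c in product(*(range(k) for k in reversed(m)))]

def local_addresses(m, origin=None):
    """Map local coordinates to local addresses 1..V, separately for G_e and G_o."""
    if origin is None:
        origin = (0,) * len(m)            # assumption: the block starts at an even site
    even, odd = {}, {}
    for x in sites_in_normal_order(m):
        parity = (sum(origin) + sum(x)) % 2   # parity of the global coordinates
        group = even if parity == 0 else odd
        group[x] = len(group) + 1             # next free address in this group
    return even, odd
```

For a block with $m = (4, 2)$ both groups contain $V = 4$ sites, and the even sites (0,0), (2,0), (1,1), (3,1) receive the local addresses 1 to 4 in this order.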

Suppose a site $n$ is in $G_e$ ($G_o$) and has local address $l$. We
refer to the link variable $U_{n,\mu}$ as:
where $E_\mu(l)$ ($O_\mu(l)$) represents the link variable attached
to the even (odd) site whose address is $l$ and pointing in the
direction $\mu$.

Next we introduce a list vector $v_e(l,\mu)$ ($v_o(l,\mu)$) which
points to the address of the odd (even) site located at
$n+\hat{\mu}$ as a function of the even (odd) site address $l$. If
$M_\mu = 1$, the site $n+\hat{\mu}$ is also in $\mathcal{P}$, and
$v_e(l,\mu)$ ($v_o(l,\mu)$) can be defined properly. If
$M_\mu \ne 1$, however, there is a subgroup $B^{(+)}_{e,\mu}$
($B^{(+)}_{o,\mu}$) of $G_e$ ($G_o$) whose neighboring sites in the
$+\mu$ direction are not in $\mathcal{P}$ but in
$\mathcal{P}+\hat{\mu}$, i.e.

$$ n \in B^{(+)}_{e,\mu} \cap G_e \;\Rightarrow\; n+\hat{\mu} \notin G_o , $$
$$ n \in B^{(+)}_{o,\mu} \cap G_o \;\Rightarrow\; n+\hat{\mu} \notin G_e . \qquad (7) $$



In this case $v_e(l,\mu)$ ($v_o(l,\mu)$) cannot be defined properly
within $\mathcal{P}$ and we must extend the definition. We introduce
a new group $N_{e,\mu}$ ($N_{o,\mu}$) which consists of the even
(odd) sites $n+\hat{\mu}$ with $n \in B^{(+)}_{o,\mu}$
($n \in B^{(+)}_{e,\mu}$). Note that the elements of $N_{e,\mu}$
($N_{o,\mu}$) are not in $\mathcal{P}$ but in
$\mathcal{P}+\hat{\mu}$. We now extend the site numbering in
$\mathcal{P}$ to $N_{e,\mu}$ ($N_{o,\mu}$): we number the elements of
$N_{e,\mu}$ ($N_{o,\mu}$) from $V+1$ to $V+V/m_\mu$ in normal order.
Then we can define $v_e(l,\mu)$ ($v_o(l,\mu)$) so that it points to
the address of the odd (even) site located at $n+\hat{\mu}$ in
$N_{o,\mu}$ ($N_{e,\mu}$).
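The extended list vectors can be built as in the following Python sketch. It is a minimal illustration under stated assumptions, not the author's code: the name `build_list_vectors` and the flag tuple `divided` (true where $M_\mu \ne 1$) are hypothetical, directions are counted from 0, the block is assumed to start at an even site, and an undivided direction is assumed to be periodic within the processor.

```python
from itertools import product

def build_list_vectors(m, divided):
    """Return (V, v_e, v_o), where v_e[(l, mu)] is the address of the odd
    neighbor at n + mu of the even site with local address l."""
    d = len(m)
    sites = [tuple(reversed(c)) for c in product(*(range(k) for k in reversed(m)))]
    addr = ({}, {})                        # addr[0]: even sites, addr[1]: odd sites
    for x in sites:                        # normal order, x_1 fastest
        g = addr[sum(x) % 2]
        g[x] = len(g) + 1
    V = len(addr[0])
    v = ({}, {})
    for p in (0, 1):
        for mu in range(d):
            ghost = 0                      # counter for sites living in P + mu
            for x, l in addr[p].items():   # dicts keep insertion (normal) order
                y = list(x)
                y[mu] += 1
                if y[mu] < m[mu]:          # neighbor is inside the processor
                    v[p][(l, mu)] = addr[1 - p][tuple(y)]
                elif divided[mu]:          # neighbor is in P + mu: address V + i
                    ghost += 1
                    v[p][(l, mu)] = V + ghost
                else:                      # undivided direction: periodic wrap
                    y[mu] = 0
                    v[p][(l, mu)] = addr[1 - p][tuple(y)]
    return V, v[0], v[1]
```

The sites of $\mathcal{P}+\hat{\mu}$ are numbered in the order in which their partners on the boundary are visited. This agrees with the normal order because all of them share the same coordinate in the $\mu$ direction.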

Just as for $B^{(+)}_{e,\mu}$ ($B^{(+)}_{o,\mu}$), we can introduce a
subgroup $B^{(-)}_{e,\mu}$ ($B^{(-)}_{o,\mu}$) of $G_e$ ($G_o$) whose
neighboring sites in the negative $\mu$ direction are not in
$\mathcal{P}$ but in $\mathcal{P}-\hat{\mu}$. Since there are
$V/m_\mu$ sites in $B^{(-)}_{e,\mu}$ ($B^{(-)}_{o,\mu}$), we can
number these sites from $i = 1$ to $V/m_\mu$ in normal order. If a
site $n$ in $B^{(-)}_{e,\mu}$ ($B^{(-)}_{o,\mu}$) has local address
$l$ and is specified by the number $i$ just introduced, we can define
a new function $b_e(i,\mu)$ ($b_o(i,\mu)$) as:

$$ l = b_e(i,\mu) \quad \text{in } B^{(-)}_{e,\mu}, \qquad 1 \le i \le V/m_\mu , $$
$$ l = b_o(i,\mu) \quad \text{in } B^{(-)}_{o,\mu}, \qquad 1 \le i \le V/m_\mu . \qquad (8) $$
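The boundary functions of eq. (8) can be tabulated as follows. This is a small Python sketch with a hypothetical name, `boundary_lists`. It assumes that the block starts at an even site, counts directions from 0, and returns plain lists, so that $b_e(i,\mu)$ corresponds to `b_e[i - 1]`.

```python
from itertools import product

def boundary_lists(m, mu):
    """Local addresses of the even and odd sites on the face x_mu = 0,
    i.e. of B^(-)_{e,mu} and B^(-)_{o,mu}, listed in normal order."""
    sites = [tuple(reversed(c)) for c in product(*(range(k) for k in reversed(m)))]
    count = [0, 0]                 # running local address for even / odd sites
    b = ([], [])                   # b[0] -> b_e, b[1] -> b_o
    for x in sites:                # normal order, x_1 fastest
        p = sum(x) % 2
        count[p] += 1
        if x[mu] == 0:             # the neighbor in the -mu direction is in P - mu
            b[p].append(count[p])
    return b[0], b[1]
```

For $m = (4, 2)$ and the first direction this gives one even and one odd boundary site, in agreement with $V/m_\mu = 4/4 = 1$.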

3. Construction of Staples 

In this section we construct the staples. For simplicity, we restrict
ourselves to the case $n \in G_e$.

3.1. positive $\nu$ : $X^{(+)}_{n,\mu\nu}$

First we construct $X^{(+)}_{n,\mu\nu}$ in two steps. We introduce a
temporary matrix $T$ as:

$$ T = U_{n,\nu} U_{n+\hat{\nu},\mu} , \qquad (9) $$

then we construct $X^{(+)}_{n,\mu\nu}$ as
If $M_\nu \ne 1$ and $n \in B^{(+)}_{e,\nu}$, we do not have
$U_{n+\hat{\nu},\mu}$ in $\mathcal{P}$. Since the way to obtain
information from other processors is machine dependent, in this paper
we use a generic send/receive syntax to describe inter-processor
communication. We have to get $U_{n+\hat{\nu},\mu}$ from
$\mathcal{P}+\hat{\nu}$; in other words, we have to send the
corresponding links of $\mathcal{P}$ to $\mathcal{P}-\hat{\nu}$. So
we first prepare the links that should be sent, so that they are
ordered sequentially in an array. Second, we send the prepared links
to $\mathcal{P}-\hat{\nu}$. Third, we receive links from
$\mathcal{P}+\hat{\nu}$ and store them at the appropriate positions
of the array.

PREPARE $O_\mu(V+i) \leftarrow O_\mu(b_o(i,\nu))$
SEND TO $\mathcal{P}-\hat{\nu}$ : $O_\mu(V+i)$
RECEIVE FROM $\mathcal{P}+\hat{\nu}$ : $O_\mu(V+i)$

For later convenience, we introduce a function SETLINK that performs
the above three operations. Its syntax is:

SETLINK($O_\mu$, $b_o$, $V$, $m_\nu$, $\nu$). (11)
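The three operations behind SETLINK can be emulated in a single Python process as in the sketch below. This is not a real message-passing implementation: the function `setlink` and the table `rank_minus` (which gives the rank of $\mathcal{P}-\hat{\nu}$ for every processor) are hypothetical, each processor's link array is a Python list with index 0 unused, and the number of boundary links is taken from the length of `b` rather than from $m_\nu$.

```python
def setlink(fields, b, V, rank_minus):
    """Emulate PREPARE / SEND / RECEIVE for a set of processors.

    fields[p]     : link array of processor p, valid indices 1 .. V + len(b)
    b             : boundary addresses b(i, nu), the same on every processor
    rank_minus[p] : rank of the neighbor P - nu of processor p
    """
    n = len(b)
    # PREPARE: pack the boundary links sequentially at addresses V+1 .. V+n
    for f in fields:
        for i, l in enumerate(b, start=1):
            f[V + i] = f[l]
    # SEND: every processor ships its packed buffer to P - nu
    in_flight = [(rank_minus[p], f[V + 1:V + 1 + n]) for p, f in enumerate(fields)]
    # RECEIVE: the buffer arriving from P + nu is stored at V+1 .. V+n
    for dest, buf in in_flight:
        fields[dest][V + 1:V + 1 + n] = buf
```

As in the PREPARE step above, the region $V+1, \ldots, V+V/m_\nu$ serves both as send buffer and as receive area, so no storage beyond this region is needed. On a real machine the SEND and RECEIVE steps would be replaced by the communication calls of that platform.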

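Now $T$ can be made by:

$$ T(l) = E_\nu(l)\, O_\mu(v_e(l,\nu)), \qquad 1 \le l \le V. \qquad (12) $$

Obviously SETLINK is not necessary if $M_\nu = 1$.
Next we prepare $U^\dagger$ in eq. (10). If $M_\mu \ne 1$, we first
perform SETLINK:

SETLINK($O_\nu$, $b_o$, $V$, $m_\mu$, $\mu$). (13)

Then we can construct $X^{(+)}_{l,\mu\nu}$ as

$$ X^{(+)}_{l,\mu\nu} = T(l)\, O^\dagger_\nu(v_e(l,\mu)), \qquad 1 \le l \le V. \qquad (14) $$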
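Equations (12) and (14) amount to two gather-and-multiply loops over the local addresses. The Python sketch below illustrates this with U(1) links, that is, complex numbers whose conjugate plays the role of the dagger. This is a simplifying assumption; for SU(N) the products become matrix products. The function name `positive_staple` is hypothetical, all arrays use addresses starting at 1 (index 0 unused), and any required SETLINK calls are assumed to have been performed beforehand.

```python
def positive_staple(E_nu, O_mu, O_nu, ve_nu, ve_mu, V):
    """Return X with X[l] = E_nu(l) * O_mu(v_e(l,nu)) * conj(O_nu(v_e(l,mu))).

    E_nu         : links in direction nu on even sites, addresses 1 .. V
    O_mu, O_nu   : links on odd sites, including the received boundary region
    ve_nu, ve_mu : list vectors v_e(l, nu) and v_e(l, mu)
    """
    # eq. (12): temporary product of the first two links
    T = [None] + [E_nu[l] * O_mu[ve_nu[l]] for l in range(1, V + 1)]
    # eq. (14): close the staple with the conjugated third link
    return [None] + [T[l] * O_nu[ve_mu[l]].conjugate() for l in range(1, V + 1)]
```

Because the list vectors already point into the extended address range, the loops are the same whether or not a direction is divided among processors.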



3.2. negative $\nu$ : $X^{(-)}_{n,\mu\nu}$

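To calculate $X^{(-)}_{n,\mu\nu}$ we use the following technique.
First we construct the staples taking the sites $n-\hat{\nu}$ as the
starting points. Next we move the staples in the $+\nu$ direction and
obtain $X^{(-)}_{n,\mu\nu}$. Just as in the $X^{(+)}_{n,\mu\nu}$
case, we proceed in two steps. Since the sites $n-\hat{\nu}$ are odd,
the temporary matrix $T$, which is the product of the first two
matrices on the right-hand side of eq. (3), can be constructed as:

$$ T(l) = O^\dagger_\nu(l)\, O_\mu(l), \qquad 1 \le l \le V. \qquad (15) $$

If $M_\mu \ne 1$, do

SETLINK($E_\nu$, $b_e$, $V$, $m_\mu$, $\mu$). (16)

We then get $W(l)$ as:

$$ W(l) = T(l)\, E_\nu(v_o(l,\mu)), \qquad 1 \le l \le V, \qquad (17) $$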

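which corresponds to $X^{(-)}_{n,\mu\nu}$ but with the sites
$n-\hat{\nu}$ as the starting points instead of the sites $n$. Next
we move $W(l)$ in the $+\nu$ direction:

$$ T(v_o(l,\nu)) \leftarrow W(l), \qquad 1 \le l \le V. \qquad (18) $$

If $M_\nu \ne 1$, the above move in the $+\nu$ direction requires
inter-processor communication. This can be done by sending the
elements of $T$ on the sites of $N_{e,\nu}$ to
$\mathcal{P}+\hat{\nu}$, receiving the corresponding elements from
$\mathcal{P}-\hat{\nu}$, and storing them in $T(n)$
($n \in B^{(-)}_{e,\nu}$).

SEND TO $\mathcal{P}+\hat{\nu}$ : $T(V+i)$
RECEIVE FROM $\mathcal{P}-\hat{\nu}$ : $T(V+i)$
STORE $T(b_e(i,\nu)) \leftarrow T(V+i)$

This gives $X^{(-)}_{l,\mu\nu}$:

$$ X^{(-)}_{l,\mu\nu} = T(l), \qquad 1 \le l \le V. \qquad (19) $$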

Just as with SETLINK, it is useful to introduce a function
SLIDEMATRIX that performs the above three operations. Its syntax is:

SLIDEMATRIX($T$, $b_e$, $V$, $m_\nu$, $\nu$). (20)
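The SEND / RECEIVE / STORE sequence behind SLIDEMATRIX can be emulated in one Python process in the same spirit. The sketch is only illustrative: `slidematrix` and the table `rank_plus` (the rank of $\mathcal{P}+\hat{\nu}$ for every processor) are hypothetical names, each array is a Python list with index 0 unused, and the number of boundary sites is taken from the length of `b` rather than from $m_\nu$.

```python
def slidematrix(T, b, V, rank_plus):
    """Emulate SEND / RECEIVE / STORE for a set of processors.

    T[p]         : staple array of processor p, valid indices 1 .. V + len(b)
    b            : boundary addresses b_e(i, nu), the same on every processor
    rank_plus[p] : rank of the neighbor P + nu of processor p
    """
    n = len(b)
    # SEND: the entries stored at V+1 .. V+n belong to sites of P + nu
    in_flight = [(rank_plus[p], t[V + 1:V + 1 + n]) for p, t in enumerate(T)]
    # RECEIVE: the buffer coming from P - nu lands at V+1 .. V+n
    for dest, buf in in_flight:
        T[dest][V + 1:V + 1 + n] = buf
    # STORE: scatter the received entries to the boundary sites b(i, nu)
    for t in T:
        for i, l in enumerate(b, start=1):
            t[l] = t[V + i]
```

The direction of communication is opposite to that of SETLINK: data computed for sites of the neighbor are pushed forward to $\mathcal{P}+\hat{\nu}$ instead of being fetched from it.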

4. Summary 

We have applied this method on both the Fujitsu AP1000 at the Fujitsu
Parallel Computing Research Facilities and the Intel Paragon at the
Institute for Numerical Simulations and Applied Mathematics,
Hiroshima University. We have checked that the method is applicable
to a system of any dimension and to a division of the original system
along any number of directions without difficulty. Compared with an
original program that uses overlapping with neighboring processors,
the working area is successfully reduced. We hope that this method
will be of help to beginners in parallel programming.